backlog: P1 — fresh-session quality research (Aaron 2026-04-23)#163

Merged
AceHack merged 2 commits into main from backlog/fresh-session-quality-research
Apr 23, 2026

@AceHack AceHack commented Apr 23, 2026

Summary

Adds a P1 BACKLOG row capturing Aaron's 2026-04-23 observation that fresh Claude Code sessions operate at noticeably lower quality than resumed sessions, and proposing research to close the gap.

"i tried a fresh session instead of resuming form the existing, its not as goona, maybe do some research on yourself on how to make sure fresh cluade sessions are as good as you, backlog item"

Why this is P1

Fresh-session quality is a scaling property — a factory whose resumed sessions are excellent but whose fresh sessions are mediocre doesn't transplant to new maintainers cleanly. Max is anticipated as the next human maintainer per CURRENT-aaron.md; his fresh-session experience is the benchmark.

Candidate causes to investigate

  1. Context-accumulation compounding (a resumed session keeps reasoning in its context window that MEMORY.md doesn't capture)
  2. Prompt-cache warmth (a fresh session pays the cold-start cost each time)
  3. Per-session calibration loss (mid-session directive shifts don't survive into a new session)
  4. CURRENT-<maintainer>.md coverage gaps (the fast-path is meant for exactly this)
  5. Soulfile-as-substrate as the real fix (per the compile-time ingest in docs/research/soulfile-staged-absorption-model-2026-04-23.md)

Deliverables

  1. Diagnostic protocol — benchmark fresh vs resumed on known-good prompts
  2. Gap analysis against Anthropic's AutoMemory and AutoDream features
  3. Factory-overlay recommendations (CURRENT-file improvements, migration discipline, soulfile compile-time design)
  4. Research doc landing under docs/research/fresh-vs-resumed-session-quality-gap-YYYY-MM-DD.md
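
As a rough illustration of deliverable 1, the diagnostic protocol could reduce to scoring the same known-good prompts in both modes and comparing the results. The sketch below is an assumption, not the protocol this PR commits to: the function name `quality_gap`, the prompt IDs, and the 0-10 rubric scores are all hypothetical placeholders, and how each session is actually run and graded is left out.

```python
from statistics import mean


def quality_gap(fresh, resumed):
    """Return per-prompt and mean score gaps (resumed minus fresh).

    Both arguments map a prompt ID to a rubric score. The scoring rubric
    itself is out of scope for this sketch.
    """
    if fresh.keys() != resumed.keys():
        raise ValueError("both modes must be scored on the same prompts")
    per_prompt = {prompt: resumed[prompt] - fresh[prompt] for prompt in fresh}
    return per_prompt, mean(per_prompt.values())


# Hypothetical scores on a 0-10 rubric; these are NOT measured results.
fresh_scores = {"prompt-a": 6, "prompt-b": 7, "prompt-c": 5}
resumed_scores = {"prompt-a": 8, "prompt-b": 7, "prompt-c": 9}

per_prompt, mean_gap = quality_gap(fresh_scores, resumed_scores)
print(per_prompt, mean_gap)
```

A positive mean gap would support the observation that resumed sessions outperform fresh ones; the per-prompt breakdown shows which prompts drive the difference.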

Self-scheduled free work under the 2026-04-23 scheduling-authority rule.

🤖 Generated with Claude Code

Copilot AI review requested due to automatic review settings April 23, 2026 16:24
AceHack added a commit that referenced this pull request Apr 23, 2026
…+ Overlay A #4 (PR #162)

Two PRs this tick, both self-scheduled free work per the
2026-04-23 scheduling-authority rule:

- PR #162 — Overlay A #4: external-signal-confirms-internal-
  insight discipline migrated per-user → in-repo
- PR #163 — P1 BACKLOG row for fresh-session quality research
  (Aaron 2026-04-23 directive)

Queue now 1 remaining Overlay A migration
(semiring-parameterized-zeta). Fresh-session gap research
cites soulfile-staged-absorption (PR #156) as the designed
fix; research would validate that thesis.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Copilot AI left a comment

Pull request overview

Adds a new P1 BACKLOG item to track research into why “fresh” Claude Code sessions appear to perform worse than resumed sessions, and to define candidate causes + deliverables for closing that gap.

Changes:

  • Adds a P1 BACKLOG row describing the fresh-vs-resumed session quality gap
  • Enumerates candidate causes and concrete deliverables for a research write-up
  • Adds priority/scope/effort framing for scheduling and planning

3 comment threads on docs/BACKLOG.md (all outdated)
@AceHack AceHack enabled auto-merge (squash) April 23, 2026 17:05
AceHack and others added 2 commits April 23, 2026 13:29
Aaron 2026-04-23: "i tried a fresh session instead of
resuming form the existing, its not as goona, maybe do some
research on yourself on how to make sure fresh cluade
sessions are as good as you, backlog item".

Research-grade row capturing:
- Observed phenomenon (resumed > fresh quality)
- 5 candidate causes (context compounding / prompt cache /
  calibration loss / CURRENT-<maintainer>.md gaps /
  soulfile-as-substrate as real fix)
- 4 deliverables (diagnostic protocol / AutoMemory gap
  analysis / factory-overlay recommendations / research
  write-up)
- P1 because scaling property (fresh sessions ≈ transplant
  to new maintainers like Max)

Self-scheduled free work under the 2026-04-23 scheduling-
authority rule.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Two changes on the fresh-session-quality branch:

1. Address PR #163 Copilot review findings:
   - soulfile-staged-absorption doc reference clarified as
     "landing via PR #156" (not in-tree yet at review time)
   - CURRENT-aaron.md clarified as per-user memory (not
     in-repo)
   - 2026-04-23 scheduling-authority rule clarified as
     captured in per-user memory (not in-repo)

2. Add P3 row for Rational Rose research per maintainer
   2026-04-23: "backlog rational rose research low priority".
   Low-priority research pointer on the UML
   model-as-source-of-truth lineage; no commitment to
   adopt; composes with the factory's OpenSpec + formal-
   spec discipline. Effort S for first-pass note.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@AceHack AceHack force-pushed the backlog/fresh-session-quality-research branch from e5b1dba to d54b96f Compare April 23, 2026 17:29
AceHack added a commit that referenced this pull request Apr 23, 2026
…filed

PR #163 (fresh-session-quality research BACKLOG): 3 Copilot
findings on references-to-not-yet-merged / references-to-
per-user-memory. Fixed at source; 3 threads resolved; rebased.

New P3 row: Rational Rose research (Aaron 2026-04-23 low-
priority directive) — UML model-as-source-of-truth lineage;
research pointer; no adopt commitment.

Both landed on PR #163's branch (same BACKLOG.md edits).

4 session PRs merged; 3 armed; 12 still open.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@AceHack AceHack merged commit 3e884ca into main Apr 23, 2026
10 checks passed
@AceHack AceHack deleted the backlog/fresh-session-quality-research branch April 23, 2026 17:31
AceHack added a commit that referenced this pull request Apr 23, 2026
Aaron: "backlog is uml modeling useful for the factory and
what tools would it require us map?"

Filed as P3 row with two-question research pointer (utility
vs existing OpenSpec + formal-spec discipline; tooling-map
for factory-technology-inventory). First-pass recommendation:
Mermaid as factory-aligned default (git-native, zero
toolchain). Auto-merge armed.

Adjacent to Rational Rose P3 row (PR #163) — both will sit
together on merge; row #54 first firing likely flags for
consolidation consideration.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
AceHack added a commit that referenced this pull request Apr 23, 2026
Aaron 2026-04-23: "backlog is uml modeling useful for the
factory and what tools would it require us map?"

Two-question research pointer:
1. Utility — does UML add value on top of OpenSpec + formal
   specs (TLA+ / Lean / Z3 / FsCheck / Alloy)?
2. Tooling-map — if we adopt, what tools would the factory
   inventory (PlantUML / Mermaid / draw.io / Structurizr /
   Rational Rose lineage)?

Composes with:
- Rational Rose P3 row (adjacent when PR #163 merges)
- docs/FACTORY-TECHNOLOGY-INVENTORY.md (PR #170 target)
- OpenSpec workflow (spec-as-source-of-truth already in
  place)
- Formal-spec stack

First-pass recommendation (to validate): Mermaid is the
factory-aligned default (git-native, zero toolchain, GitHub
renders natively); heavy UML tools likely over-scoped.

Research note under docs/research/uml-modelling-for-the-
factory-YYYY-MM-DD.md when prioritised. No adopt commitment.
No deadline. Effort S first-pass; M if adopting.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
AceHack added a commit that referenced this pull request Apr 23, 2026
…#173)

Aaron 2026-04-23: "backlog is uml modeling useful for the
factory and what tools would it require us map?"

Two-question research pointer:
1. Utility — does UML add value on top of OpenSpec + formal
   specs (TLA+ / Lean / Z3 / FsCheck / Alloy)?
2. Tooling-map — if we adopt, what tools would the factory
   inventory (PlantUML / Mermaid / draw.io / Structurizr /
   Rational Rose lineage)?

Composes with:
- Rational Rose P3 row (adjacent when PR #163 merges)
- docs/FACTORY-TECHNOLOGY-INVENTORY.md (PR #170 target)
- OpenSpec workflow (spec-as-source-of-truth already in
  place)
- Formal-spec stack

First-pass recommendation (to validate): Mermaid is the
factory-aligned default (git-native, zero toolchain, GitHub
renders natively); heavy UML tools likely over-scoped.

Research note under docs/research/uml-modelling-for-the-
factory-YYYY-MM-DD.md when prioritised. No adopt commitment.
No deadline. Effort S first-pass; M if adopting.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
AceHack added a commit that referenced this pull request Apr 23, 2026
…tive + first test

First Otto-attributed tick. Three directive absorptions:

(a) Loop agent named Otto, role Project Manager per Aaron
    2026-04-23 directive. Otto IS Claude-in-autonomous-loop-
    without-a-persona-hat; sibling to Kenji/Aarav/etc.
    Not a new SKILL.md. Prior "unnamed-default (loop-agent)"
    attributions (Showcase, Anima) reattribute to Otto.

(b) Claude Cowork fact-check: Google hallucinated `-w`
    workstream mode. Real flag is `--worktree` (git worktree
    isolation). Cowork is a separate Anthropic product
    (Claude Desktop / web), not a CLI mode. `/loop` already
    inherits all harness features. No restart needed.

(c) NSA (New Session Agent) persona = first-class directive.
    Extends PR #163 passive monitoring → active testing.
    5-prompt test set, 3 configurations (baseline /
    NSA-default / NSA-worktree), 5 metrics.
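
The test plan in (c) amounts to a 5 x 3 run matrix. A minimal sketch of enumerating it follows, assuming placeholder prompt IDs (the actual five prompts and five metrics are not listed in this commit message) and a hypothetical helper name `build_run_matrix`. Only the three configuration names come from the text above.

```python
from itertools import product

# Configuration names as given in the commit message.
CONFIGURATIONS = ("baseline", "NSA-default", "NSA-worktree")

# Placeholder prompt IDs; the real 5-prompt test set is not shown here.
PROMPTS = tuple(f"prompt-{i}" for i in range(1, 6))


def build_run_matrix(prompts=PROMPTS, configurations=CONFIGURATIONS):
    """Return one run descriptor per (configuration, prompt) pair."""
    return [
        {"configuration": configuration, "prompt": prompt}
        for configuration, prompt in product(configurations, prompts)
    ]


runs = build_run_matrix()
print(len(runs))
```

Each run descriptor would then be scored on every metric, so a full pass produces one score per run per metric.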

First NSA test run same-tick: `claude -p --model haiku-4-5`
cold-start query found Zeta project identity correctly but
FAILED to find Otto — gap identified (MEMORY.md had no
pointer to new per-user memories). Fixed same-tick.
Concrete demonstration: NSA testing catches substrate gaps
that current-session agents miss.
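
The gap found here (MEMORY.md had no pointer to the new per-user memories) is the kind of thing a small consistency check could flag before a fresh session runs into it. The sketch below is one hypothetical way to do that, assuming the index references memories by file name. The helper `unindexed_memories`, the index contents, and `project_zeta_identity.md` are illustrative assumptions; only the Otto file name is taken from the "Per-user memories filed" list in this commit message.

```python
def unindexed_memories(index_text, memory_files):
    """Return memory file names that the index text never mentions."""
    return [name for name in memory_files if name not in index_text]


# Hypothetical index content; a real MEMORY.md may be laid out differently.
index_text = "- [Zeta project identity](project_zeta_identity.md)\n"

memory_files = [
    "project_zeta_identity.md",  # hypothetical file name
    "project_loop_agent_named_otto_role_project_manager_2026_04_23.md",
]

missing = unindexed_memories(index_text, memory_files)
print(missing)
```

A plain substring match is deliberately crude: it can miss renamed files and can report false negatives when one file name is a prefix of another.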

Attribution: Otto (loop-agent PM hat) for hat-less work.
No persona hats worn this tick.

Per-user memories filed:
- project_loop_agent_named_otto_role_project_manager_2026_04_23.md
- reference_claude_code_w_flag_is_worktree_not_workstream_cowork_is_separate_product_2026_04_23.md
- feedback_new_session_agent_persona_first_class_experience_test_fresh_sessions_including_worktree_2026_04_23.md

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>